Distributed Subgroup Mining
Abstract
Subgroup discovery is a popular form of supervised rule learning, applicable to both descriptive and predictive tasks. In this work we study two natural extensions of classical subgroup discovery to distributed settings. In the first variant, the goal is to efficiently identify global subgroups, i.e., the rules an analysis would yield after collecting all the data in a single central database. In contrast, the second variant explicitly takes the locality of the data into account: the aim is to find patterns that point out major differences between individual databases with respect to a specific property of interest (the target attribute). We point out substantial differences between these novel learning problems and other kinds of distributed data mining tasks. These differences motivate new search and communication strategies that aim to minimize computation time and communication costs. We present and empirically evaluate new algorithms for both variants.
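To make the two variants concrete, here is a minimal sketch, not the paper's algorithm. It assumes horizontally partitioned data (each site holds disjoint records with the same attributes) and uses weighted relative accuracy (WRAcc) as the quality function; the abstract does not name a quality function. The `Counts` message format and the functions `merge`, `global_quality`, and `site_deviation` are hypothetical. Because each site reports only four counts per candidate subgroup, the global quality can be computed without shipping raw records. `site_deviation` is an illustrative score for the second, locality-aware variant.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Counts:
    """Counts one site reports for one candidate subgroup (hypothetical message format)."""
    sg_size: int      # records at the site covered by the subgroup description
    sg_positive: int  # covered records that have the target value of interest
    size: int         # all records at the site
    positive: int     # all records at the site that have the target value


def wracc(c: Counts) -> float:
    """Weighted relative accuracy (assumed quality function): (n_sg/n) * (p_sg/n_sg - p/n)."""
    if c.size == 0 or c.sg_size == 0:
        return 0.0
    coverage = c.sg_size / c.size
    return coverage * (c.sg_positive / c.sg_size - c.positive / c.size)


def merge(local: List[Counts]) -> Counts:
    """Sum per-site counts; valid because the sites hold disjoint record sets."""
    return Counts(
        sg_size=sum(c.sg_size for c in local),
        sg_positive=sum(c.sg_positive for c in local),
        size=sum(c.size for c in local),
        positive=sum(c.positive for c in local),
    )


def global_quality(local: List[Counts]) -> float:
    """Quality the subgroup would have if all data sat in one central database."""
    return wracc(merge(local))


def site_deviation(local: Dict[str, Counts]) -> Dict[str, float]:
    """Illustrative locality score: per-site target share in the subgroup minus the pooled share."""
    pooled = merge(list(local.values()))
    if pooled.sg_size == 0:
        return {site: 0.0 for site in local}
    pooled_share = pooled.sg_positive / pooled.sg_size
    return {
        site: (c.sg_positive / c.sg_size - pooled_share) if c.sg_size else 0.0
        for site, c in local.items()
    }


if __name__ == "__main__":
    # Two hypothetical sites, each with 100 records.
    sites = {
        "A": Counts(sg_size=20, sg_positive=15, size=100, positive=40),
        "B": Counts(sg_size=30, sg_positive=9, size=100, positive=30),
    }
    print(round(global_quality(list(sites.values())), 4))  # 0.0325
    print({k: round(v, 2) for k, v in site_deviation(sites).items()})  # {'A': 0.27, 'B': -0.18}
```

In this example, the local WRAcc values are 0.07 at site A and 0.0 at site B. Their average (0.035) differs from the global value (0.0325), so a global subgroup cannot in general be ranked by combining locally computed qualities; the underlying counts have to be aggregated.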
Similar resources
Secure Distributed Subgroup Discovery in Horizontally Partitioned Data
Supervised descriptive rule discovery techniques like subgroup discovery are quite popular in applications like fraud detection or clinical studies. Compared with other descriptive techniques, like classical support/confidence association rules, subgroup discovery has the advantage that it comes up with only the top-k patterns, and that it makes use of a quality function that avoids patterns un...
Secure Top-k Subgroup Discovery
Supervised descriptive rule discovery techniques like subgroup discovery are quite popular in applications like fraud detection or clinical studies. Compared with other descriptive techniques, like classical support/confidence association rules, subgroup discovery has the advantage that it comes up with only the top-k patterns, and that it makes use of a quality function that avoids patterns un...
Secure Privacy Preserving Mining of Association Rule in Horizontally Distributed Databases
Data mining is used to extract important knowledge from large datasets, but sometimes these datasets are split among various parties. Association rule mining is one of the data mining techniques used in distributed databases. This technique discloses some interesting relationships between locally large and globally large item sets and proposes an algorithm, fast distributed mining of association r...
Semantic Subgroup Discovery Systems and Workflows in the SDM-Toolkit
This paper addresses semantic data mining, a new data mining paradigm in which ontologies are exploited in the process of data mining and knowledge discovery. This paradigm is introduced together with new semantic subgroup discovery systems SDM-search for enriched gene sets (SEGS) and SDM-Aleph. These systems are made publicly available in the new SDM-Toolkit for semantic data mining. The toolk...
Fast Discovery of Relevant Subgroup Patterns
Subgroup discovery is a prominent data mining method for discovering local patterns. Since a set of very similar, overlapping subgroup patterns is often retrieved, efficient methods for extracting a set of relevant subgroups are required. This paper presents a novel algorithm based on a vertical data structure that not only discovers interesting subgroups quickly, but also integrates efficient...